1  Introduction


Measurement error models commonly begin with an underlying model in which one or more of the independent variables are measured with error. The distinguishing feature of a measurement error problem is that the variables measured with error cannot be observed directly. The goal of measurement error modeling is to draw valid inferences about the underlying model from the error-contaminated observations, and attaining this goal requires careful analysis.

This selection problem arose in statistical consulting. When I (Xun Lin) was a statistical consultant in the summer of 1996, one of my clients came up with a problem which can be simplified as follows.

Suppose we have $k$ treatments $\Pi_i$, $i = 1, \ldots, k$, and $n$ observations from each treatment. For each treatment $\Pi_i$, $i = 1, \ldots, k$, and each observation $j = 1, \ldots, n$, we have the following model:

$$Y_{ij} = \beta_{0i} + \beta_{1i} X_{ij} + \epsilon_{ij}, \qquad W_{ij} = X_{ij} + U_{ij}, \qquad (1)$$

where $\{(X_{ij}, U_{ij}, \epsilon_{ij}), 1 \le j \le n\}$ are independently but not necessarily identically distributed random vectors with means $(0, 0, 0)$ and variances $(\sigma_{xxi}, \sigma_{uui}, \sigma_{\epsilon\epsilon i})$. For $i = 1, \ldots, k$ and $j = 1, \ldots, n$, $X_{ij}$, $U_{ij}$ and $\epsilon_{ij}$ are independent of each other. However, $X_{ij}$ cannot be observed; we can only observe $(W_{ij}, Y_{ij})$. We assume that for each $i$, $\sigma_{uui}$ is known and $\sigma_{xxi} > 0$.

A treatment $\Pi_i$ is said to be the best if its slope parameter $\beta_{1i}$ is the largest among the $k$ slope parameters; otherwise the treatment is said to be nonbest. The goal of this selection problem is to select the best treatment from the $k$ treatments.

Let $\Omega = \{\beta = (\beta_{11}, \beta_{12}, \ldots, \beta_{1k}) : \beta_{1i} \in \mathbb{R}, i = 1, \ldots, k\}$ be the parameter space. Let $a = (a_1, \ldots, a_k)$ be an action, where $a_i = 0, 1$, $i = 1, \ldots, k$. When action $a$ is taken, $a_i = 1$ means that treatment $\Pi_i$ is selected as the best treatment; otherwise $a_i = 0$ and $\Pi_i$ is excluded as nonbest. For $i = 1, \ldots, k$, let $W_i = (W_{i1}, \ldots, W_{in})$, $Y_i = (Y_{i1}, \ldots, Y_{in})$, $W = (W_1, \ldots, W_k)$, and $Y = (Y_1, \ldots, Y_k)$. Let $\mathcal{X}$ be the sample space generated by $(W, Y)$. Since the true order of $\beta_{11}, \ldots, \beta_{1k}$ is unknown, we denote the ordered slopes by $\beta_{1[1]} \le \beta_{1[2]} \le \cdots \le \beta_{1[k]}$. For simplicity, we assume that $\beta_{1[k]} - \beta_{1[k-1]} = 2\delta > 0$.

A selection rule $d(w, y) = (d_1(w, y), \ldots, d_k(w, y))$ is a mapping defined on $\mathcal{X}$, where $d_i(w, y)$ is the probability that, given $W = w$ and $Y = y$, $\Pi_i$ is selected as the best. Also, $\sum_{i=1}^{k} d_i(w, y) = 1$ for all $(w, y) \in \mathcal{X}$.

We consider the following loss function:

$$L(\beta, a) = \begin{cases} 1, & \text{if a nonbest treatment is selected,} \\ 0, & \text{if the best treatment is selected.} \end{cases} \qquad (2)$$
2  Formulation of the Selection Procedure


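The population moments of $(W_{ij}, Y_{ij})$ satisfy

$$(E W_{ij}, E Y_{ij}) = (0, \beta_{0i}) \qquad (3)$$

and

$$(\sigma_{wwi}, \sigma_{wyi}, \sigma_{yyi}) = (\sigma_{xxi} + \sigma_{uui}, \ \beta_{1i}\sigma_{xxi}, \ \beta_{1i}^2\sigma_{xxi} + \sigma_{\epsilon\epsilon i}). \qquad (4)$$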
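The moment relations (3) and (4) can be checked by simulation. The Python sketch below draws a large sample from model (1) for a single treatment and compares the sample moments with the right-hand sides of (3) and (4). It assumes normal $X_{ij}$, $U_{ij}$ and $\epsilon_{ij}$ (model (1) itself only fixes their means and variances) and arbitrary illustrative parameter values; the helper names are hypothetical.

```python
import numpy as np

def population_moments(beta1, s_xx, s_uu, s_ee):
    """Right-hand side of (4): (sigma_ww, sigma_wy, sigma_yy)."""
    return (s_xx + s_uu, beta1 * s_xx, beta1 ** 2 * s_xx + s_ee)

def simulate_treatment(n, beta0, beta1, s_xx, s_uu, s_ee, rng):
    """Draw n pairs (W_ij, Y_ij) from model (1), assuming normal X, U and epsilon."""
    x = rng.normal(0.0, np.sqrt(s_xx), n)
    u = rng.normal(0.0, np.sqrt(s_uu), n)
    e = rng.normal(0.0, np.sqrt(s_ee), n)
    return x + u, beta0 + beta1 * x + e

rng = np.random.default_rng(0)
w, y = simulate_treatment(200_000, 0.5, 2.0, 1.5, 1.0, 0.25, rng)
cov = np.cov(w, y)  # 2 x 2 sample covariance matrix (divisor n - 1)
print("means:", w.mean(), y.mean())                        # close to (0, beta0) = (0, 0.5)
print("sample:", cov[0, 0], cov[0, 1], cov[1, 1])
print("theory:", population_moments(2.0, 1.5, 1.0, 0.25))  # (2.5, 3.0, 6.25)
```

With 200,000 draws the sample moments typically agree with (3) and (4) to about two decimal places.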


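The sample means $(\bar W_i, \bar Y_i)$ and the sample covariances $(S_{wwi}, S_{wyi}, S_{yyi})$, where, for example,

$$\bar W_i = \frac{1}{n}\sum_{j=1}^{n} W_{ij}, \qquad (5)$$

$$S_{wyi} = \frac{1}{n-1}\sum_{j=1}^{n} (W_{ij} - \bar W_i)(Y_{ij} - \bar Y_i), \qquad (6)$$

will be the bases of our selection procedure.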

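We estimate the unknown parameters by replacing the unknown population moments with their sample counterparts. For the resulting quantities to be proper estimators, $\hat\sigma_{xxi}$ must be positive. We set $\hat\sigma_{xxi} = S_{wwi} - \sigma_{uui}$ when $S_{wwi} - \sigma_{uui}$ is positive and use an alternative estimator otherwise. Correspondingly, the slope estimator is

$$\hat\beta_{1i} = \begin{cases} (S_{wwi} - \sigma_{uui})^{-1} S_{wyi}, & \text{if } S_{wwi} - \sigma_{uui} > 0, \\ S_{wyi}^{-1} S_{yyi}, & \text{otherwise.} \end{cases} \qquad (7)$$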

Since we take $n$ samples from each treatment, our selection procedure will be $d_n(w, y) = (d_{1n}(w, y), d_{2n}(w, y), \ldots, d_{kn}(w, y))$, where

$$d_{in}(w, y) = \begin{cases} 1, & \text{if } \hat\beta_{1i} \text{ is the largest among the } k \text{ slope estimates,} \\ 0, & \text{otherwise,} \end{cases} \qquad (8)$$

when $W = w$ and $Y = y$ are observed.
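A minimal Python sketch of the estimators (5)-(7) and of rule (8) for given data follows. It assumes the data for each treatment are held as two NumPy arrays; the function names are hypothetical, and breaking ties in favour of the smallest index is an assumption, since (8) does not address ties. The second branch implements the second case of (7) and is only reached when $S_{wwi} \le \sigma_{uui}$.

```python
import numpy as np

def slope_estimate(w, y, s_uu):
    """Moment estimator of beta_1i from (5)-(7) for one treatment."""
    s = np.cov(w, y)                 # sample covariances with divisor n - 1, as in (6)
    s_ww, s_wy, s_yy = s[0, 0], s[0, 1], s[1, 1]
    if s_ww - s_uu > 0:
        return s_wy / (s_ww - s_uu)  # first case of (7)
    return s_yy / s_wy               # second case of (7)

def select_best(samples, s_uu):
    """Rule (8): return the indicator vector d_n and the k slope estimates.

    samples: list of (w_i, y_i) array pairs, one per treatment.
    s_uu:    list of known measurement error variances sigma_uui.
    """
    est = np.array([slope_estimate(w, y, s) for (w, y), s in zip(samples, s_uu)])
    d = np.zeros(len(est), dtype=int)
    d[int(np.argmax(est))] = 1       # np.argmax returns the first maximiser on ties
    return d, est

w = np.array([1.0, 2.0, 3.0, 4.0])
d, est = select_best([(w, 0 * w), (w, 1 * w), (w, 2 * w)], [0.0, 0.0, 0.0])
print(d, est)                        # [0 0 1] [0. 1. 2.]
```

In the usage line the responses are exact multiples of $w$ with $\sigma_{uui} = 0$, so the estimates recover the slopes 0, 1, 2 and the third treatment is selected.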
3  Performance of the Selection Procedures


In this section we study the performance of the selection procedures. We first analyze the expected risk of the proposed procedure.

Definition 1. A sequence of selection procedures $\{d_n(w, y)\}_{n=1}^{\infty}$ is said to be asymptotically optimal of order $\epsilon_n$ if $E^{(W,Y)} L(\beta, d_n(W, Y)) = O(\epsilon_n)$, where $\{\epsilon_n\}$ is a sequence of positive numbers such that $\lim_{n\to\infty}\epsilon_n = 0$.

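Denote by $P_n$ the probability measure generated by the random observations $(W, Y)$, and for each $(w, y) \in \mathcal{X}$, let $i^*$ and $i_n^*$ be the indices satisfying

$$\beta_{1i^*} = \max_{1 \le j \le k} \beta_{1j} \quad \text{and} \quad \hat\beta_{1i_n^*} = \max_{1 \le j \le k} \hat\beta_{1j}. \qquad (9)$$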

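Then the expected risk of the proposed selection procedure is

$$E^{(W,Y)} L(\beta, d_n(W, Y)) = \sum_{i=1}^{k} \sum_{j=1, j \ne i}^{k} P_n\{i^* = i, i_n^* = j\}. \qquad (10)$$

On the event $\{i^* = i, i_n^* = j\}$ with $j \ne i$ we have $\hat\beta_{1j} \ge \hat\beta_{1i}$ and $\beta_{1i} - \beta_{1j} \ge 2\delta$, so either $\beta_{1i} - \hat\beta_{1i} \ge \delta$ or $\hat\beta_{1j} - \beta_{1j} \ge \delta$. Splitting further according to whether $S_{wwi} - \sigma_{uui}$ and $S_{wwj} - \sigma_{uuj}$ are at least $\sigma_{xxi}/2$ and $\sigma_{xxj}/2$, respectively, we obtain

$$\begin{aligned}
E^{(W,Y)} L(\beta, d_n(W, Y))
&\le \sum_{i=1}^{k} \sum_{j=1, j \ne i}^{k} P_n\Big\{i^* = i, i_n^* = j, S_{wwi} - \sigma_{uui} \ge \tfrac{\sigma_{xxi}}{2}, S_{wwj} - \sigma_{uuj} \ge \tfrac{\sigma_{xxj}}{2}\Big\} \\
&\quad + \sum_{i=1}^{k} \sum_{j=1, j \ne i}^{k} P_n\Big\{i^* = i, i_n^* = j, S_{wwi} - \sigma_{uui} < \tfrac{\sigma_{xxi}}{2}\Big\} \\
&\quad + \sum_{i=1}^{k} \sum_{j=1, j \ne i}^{k} P_n\Big\{i^* = i, i_n^* = j, S_{wwj} - \sigma_{uuj} < \tfrac{\sigma_{xxj}}{2}\Big\} \\
&\le \sum_{i=1}^{k} \sum_{j=1, j \ne i}^{k} P_n\Big\{\beta_{1i} - \hat\beta_{1i} \ge \delta, S_{wwi} - \sigma_{uui} \ge \tfrac{\sigma_{xxi}}{2}\Big\} \\
&\quad + \sum_{i=1}^{k} \sum_{j=1, j \ne i}^{k} P_n\Big\{\hat\beta_{1j} - \beta_{1j} \ge \delta, S_{wwj} - \sigma_{uuj} \ge \tfrac{\sigma_{xxj}}{2}\Big\} \\
&\quad + 2k \sum_{i=1}^{k} P_n\Big\{S_{wwi} - \sigma_{uui} < \tfrac{\sigma_{xxi}}{2}\Big\} \\
&\le k \sum_{i=1}^{k} P_n\Big\{|\hat\beta_{1i} - \beta_{1i}| \ge \delta, S_{wwi} - \sigma_{uui} \ge \tfrac{\sigma_{xxi}}{2}\Big\} + 2k \sum_{i=1}^{k} P_n\Big\{S_{wwi} - \sigma_{uui} < \tfrac{\sigma_{xxi}}{2}\Big\}.
\end{aligned} \qquad (11)$$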


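From the above we observe that it suffices to analyze the convergence rates of the following two sequences:

$$P_n\Big\{S_{wwi} - \sigma_{uui} < \frac{\sigma_{xxi}}{2}\Big\}, \qquad P_n\Big\{|\hat\beta_{1i} - \beta_{1i}| \ge \delta, \ S_{wwi} - \sigma_{uui} \ge \frac{\sigma_{xxi}}{2}\Big\}. \qquad (12)$$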

We analyze the rate of convergence under the following conditions.

3.1  When The a-th Moment Exists (a > 2)

In this subsection we suppose that the $a$-th ($a > 2$) moments of $(X_{ij}, U_{ij}, \epsilon_{ij})$ exist, that is,

$$E|X_{ij}|^{a} < \infty, \qquad E|U_{ij}|^{a} < \infty, \qquad E|\epsilon_{ij}|^{a} < \infty. \qquad (13)$$

We will show that the expected risk of the proposed selection procedure converges to 0 at the rate $o(n^{-(a/2-1)})$.

We introduce some useful lemmas. The first lemma is well known; a similar result can be found in Baum and Katz (1965).

Lemma 1. Let $X_1, \ldots, X_n$ be independent random variables with mean 0. Suppose that for a fixed number $\alpha > 1$, $E|X_i|^{\alpha} < \infty$ for $i = 1, \ldots, n$. Then for any $\epsilon > 0$,

$$P\Big\{\Big|\frac{1}{n}\sum_{i=1}^{n} X_i\Big| > \epsilon\Big\} = o(n^{-(\alpha-1)}). \qquad (14)$$


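As a consequence of Lemma 1, we have the following.

Lemma 2. Let $X_1, \ldots, X_n$ be independent random variables with mean $EX_i = \mu$ and variance $\mathrm{Var}\,X_i = \sigma^2$, for $i = 1, \ldots, n$. Also let $\bar X = \frac{1}{n}\sum_{i=1}^{n} X_i$ and $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar X)^2$. Suppose that for $i = 1, \ldots, n$ and a fixed number $a > 2$, $E|X_i|^{a} < \infty$. Then for any $\epsilon > 0$,

$$P\{|S^2 - \sigma^2| > \epsilon\} = o(n^{-(a/2-1)}). \qquad (15)$$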

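Proof. Write $A_n = \frac{1}{n}\sum_{i=1}^{n} X_i^2 - (\mu^2 + \sigma^2)$ and $B_n = \bar X^2 - \mu^2$. Since $S^2 = \frac{n}{n-1}\big(\frac{1}{n}\sum_{i=1}^{n} X_i^2 - \bar X^2\big)$,

$$S^2 - \sigma^2 = \frac{n}{n-1}(A_n - B_n) + \frac{\sigma^2}{n-1}. \qquad (16)$$

For $n$ large enough, that is, when $n \ge \max(4, 1 + 4\sigma^2/\epsilon)$, we have $\frac{n}{n-1} \le \frac{4}{3}$ and $\frac{\sigma^2}{n-1} \le \frac{\epsilon}{4}$. Hence $|S^2 - \sigma^2| > \epsilon$ implies $|A_n - B_n| > \frac{3}{4}\cdot\frac{3\epsilon}{4} > \frac{\epsilon}{2}$, and therefore

$$P\{|S^2 - \sigma^2| > \epsilon\} \le P\Big\{|A_n| > \frac{\epsilon}{4}\Big\} + P\Big\{|B_n| > \frac{\epsilon}{4}\Big\} := I_1 + I_2. \qquad (17)$$

The random variables $X_i^2 - (\mu^2 + \sigma^2)$, $i = 1, \ldots, n$, are independent with mean 0 and finite moments of order $a/2 > 1$. From Lemma 1, we have

$$I_1 = P\Big\{\Big|\frac{1}{n}\sum_{i=1}^{n}\big[X_i^2 - (\mu^2 + \sigma^2)\big]\Big| > \frac{\epsilon}{4}\Big\} \qquad (18)$$

$$= o(n^{-(a/2-1)}) \qquad (19)$$

and, since $|\bar X + \mu| > 2|\mu| + 1$ implies $|\bar X - \mu| > 1$,

$$\begin{aligned}
I_2 &= P\Big\{|(\bar X + \mu)(\bar X - \mu)| > \frac{\epsilon}{4} \text{ and } |\bar X + \mu| > 2|\mu| + 1\Big\} + P\Big\{|(\bar X + \mu)(\bar X - \mu)| > \frac{\epsilon}{4} \text{ and } |\bar X + \mu| \le 2|\mu| + 1\Big\} \\
&\le P\{|\bar X - \mu| > 1\} + P\{4(2|\mu| + 1)|\bar X - \mu| > \epsilon\} = o(n^{-(a-1)}),
\end{aligned} \qquad (20)$$

again by Lemma 1. Since $a - 1 > a/2 - 1$, (15) follows.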
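The algebra behind the proof of Lemma 2 can be checked numerically. Writing $A_n = \frac{1}{n}\sum_{i=1}^{n} X_i^2 - (\mu^2 + \sigma^2)$ and $B_n = \bar X^2 - \mu^2$, one valid way to carry out the proof uses the exact decomposition $S^2 - \sigma^2 = \frac{n}{n-1}(A_n - B_n) + \frac{\sigma^2}{n-1}$ and the fact that, for $n \ge \max(4, 1 + 4\sigma^2/\epsilon)$, the event $|S^2 - \sigma^2| > \epsilon$ forces $|A_n| > \epsilon/4$ or $|B_n| > \epsilon/4$. The Python sketch below checks both statements on repeated samples; the shifted $t_5$ distribution, the values of $\epsilon$ and $n$, and the helper name are illustrative assumptions.

```python
import numpy as np

def lemma2_pieces(x, mu, sigma2):
    """Return S^2 - sigma^2, A_n, B_n and the right-hand side of the decomposition."""
    n = len(x)
    a = np.mean(x ** 2) - (mu ** 2 + sigma2)   # A_n
    b = np.mean(x) ** 2 - mu ** 2              # B_n
    lhs = np.var(x, ddof=1) - sigma2
    rhs = n / (n - 1) * (a - b) + sigma2 / (n - 1)
    return lhs, a, b, rhs

rng = np.random.default_rng(1)
mu, sigma2, eps, n = 3.0, 5.0 / 3.0, 1.0, 8    # n >= max(4, 1 + 4 * sigma2 / eps), about 7.67
violations = 0
for _ in range(10_000):
    x = mu + rng.standard_t(df=5, size=n)      # t with 5 d.f. has variance 5/3
    lhs, a, b, rhs = lemma2_pieces(x, mu, sigma2)
    assert abs(lhs - rhs) < 1e-9               # the decomposition is exact
    if abs(lhs) > eps and not (abs(a) > eps / 4 or abs(b) > eps / 4):
        violations += 1
print(violations)  # 0
```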

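From Lemma 2, applied to $W_{i1}, \ldots, W_{in}$ (which have finite $a$-th moments by (13)), we can see that $P\{|S_{wwi} - \sigma_{wwi}| > \epsilon\} = o(n^{-(a/2-1)})$ for any $\epsilon > 0$. Moreover, since $\sigma_{wwi} = \sigma_{xxi} + \sigma_{uui}$,

$$P_n\Big\{S_{wwi} - \sigma_{uui} < \frac{\sigma_{xxi}}{2}\Big\} = P_n\Big\{S_{wwi} - \sigma_{wwi} < -\frac{\sigma_{xxi}}{2}\Big\} \le P_n\Big\{|S_{wwi} - \sigma_{wwi}| > \frac{\sigma_{xxi}}{2}\Big\} = o(n^{-(a/2-1)}). \qquad (21)$$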


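Let $\delta^* = \sigma_{xxi}\delta/2$. On the event $\{S_{wwi} - \sigma_{uui} \ge \sigma_{xxi}/2\}$ we have $\hat\beta_{1i} = S_{wyi}/(S_{wwi} - \sigma_{uui})$, so $|\hat\beta_{1i} - \beta_{1i}| \ge \delta$ implies $|S_{wyi} - \beta_{1i}(S_{wwi} - \sigma_{uui})| \ge \delta^*$. Writing $S_{wyi} = \frac{1}{n-1}\sum_{j=1}^{n} W_{ij}Y_{ij} - \frac{n}{n-1}\bar W_i\bar Y_i$ and $S_{wwi} = \frac{1}{n-1}\sum_{j=1}^{n} W_{ij}^2 - \frac{n}{n-1}\bar W_i^2$, we obtain

$$\begin{aligned}
&P_n\Big\{|\hat\beta_{1i} - \beta_{1i}| \ge \delta, \ S_{wwi} - \sigma_{uui} \ge \frac{\sigma_{xxi}}{2}\Big\} \le P_n\{|S_{wyi} - \beta_{1i}(S_{wwi} - \sigma_{uui})| \ge \delta^*\} \\
&\quad \le P_n\Big\{\Big|\frac{1}{n-1}\sum_{j=1}^{n}(W_{ij}Y_{ij} - \beta_{1i}\sigma_{xxi})\Big| > \frac{\delta^*}{6}\Big\} \\
&\qquad + P_n\Big\{\Big|\frac{n}{n-1}\bar W_i\bar Y_i - \frac{n}{n-1}\beta_{1i}\bar W_i^2 + \frac{1}{n-1}\beta_{1i}\sigma_{uui}\Big| > \frac{\delta^*}{6}\Big\} \\
&\qquad + P_n\Big\{\Big|\frac{\beta_{1i}}{n-1}\sum_{j=1}^{n}(W_{ij}^2 - \sigma_{xxi} - \sigma_{uui})\Big| > \frac{\delta^*}{6}\Big\} \\
&\quad := J_1 + J_2 + J_3.
\end{aligned} \qquad (22)$$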
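The splitting behind (22) rests on an exact algebraic identity. With $T_1 = \frac{1}{n-1}\sum_{j=1}^{n}(W_{ij}Y_{ij} - \beta_{1i}\sigma_{xxi})$, $T_2 = \frac{n}{n-1}(\bar W_i\bar Y_i - \beta_{1i}\bar W_i^2) + \frac{1}{n-1}\beta_{1i}\sigma_{uui}$ and $T_3 = \frac{\beta_{1i}}{n-1}\sum_{j=1}^{n}(W_{ij}^2 - \sigma_{xxi} - \sigma_{uui})$, one has $S_{wyi} - \beta_{1i}(S_{wwi} - \sigma_{uui}) = T_1 - T_2 - T_3$ for any data and any values of $\beta_{1i}$, $\sigma_{xxi}$, $\sigma_{uui}$. The Python sketch below checks this on arbitrary inputs; the function name is hypothetical.

```python
import numpy as np

def split_terms(w, y, beta1, s_xx, s_uu):
    """Return S_wy - beta1 (S_ww - s_uu) and T1 - T2 - T3 for one treatment."""
    n = len(w)
    s_ww = np.var(w, ddof=1)
    s_wy = np.cov(w, y)[0, 1]
    lhs = s_wy - beta1 * (s_ww - s_uu)
    t1 = np.sum(w * y - beta1 * s_xx) / (n - 1)
    t2 = n / (n - 1) * (w.mean() * y.mean() - beta1 * w.mean() ** 2) + beta1 * s_uu / (n - 1)
    t3 = beta1 / (n - 1) * np.sum(w ** 2 - s_xx - s_uu)
    return lhs, t1 - t2 - t3

rng = np.random.default_rng(3)
w = rng.normal(size=12)
y = 0.7 + 1.3 * w + rng.normal(size=12)
lhs, rhs = split_terms(w, y, beta1=1.3, s_xx=1.5, s_uu=1.0)
print(lhs, rhs)   # equal up to rounding error
```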


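For any $i = 1, \ldots, k$, $\{W_{ij}Y_{ij}, j = 1, \ldots, n\}$ are independent random variables with mean $E(W_{ij}Y_{ij}) = \beta_{1i}\sigma_{xxi}$. By Hölder's inequality,

$$E|W_{ij}Y_{ij}|^{a/2} \le \sqrt{E|W_{ij}|^{a}\,E|Y_{ij}|^{a}} < \infty; \qquad (23)$$

therefore, by Lemma 1 with $\alpha = a/2$, we have, for $n \ge 2$,

$$J_1 = P_n\Big\{\Big|\frac{1}{n}\sum_{j=1}^{n}(W_{ij}Y_{ij} - \beta_{1i}\sigma_{xxi})\Big| > \frac{n-1}{n}\cdot\frac{\delta^*}{6}\Big\} \qquad (24)$$

$$\le P_n\Big\{\Big|\frac{1}{n}\sum_{j=1}^{n}(W_{ij}Y_{ij} - \beta_{1i}\sigma_{xxi})\Big| > \frac{\delta^*}{12}\Big\} = o(n^{-(a/2-1)}). \qquad (25)$$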


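Since

$$\bar W_i\bar Y_i - \beta_{1i}\bar W_i^2 = \bar W_i(\beta_{0i} + \bar\epsilon_i + \beta_{1i}\bar X_i - \beta_{1i}\bar W_i) = \beta_{0i}\bar W_i + \bar\epsilon_i\bar W_i - \beta_{1i}\bar X_i\bar U_i - \beta_{1i}\bar U_i^2, \qquad (26)$$

where $\bar X_i$, $\bar U_i$ and $\bar\epsilon_i$ are defined analogously to (5), we observe that, with $\eta = \delta^*/48$ and $n \ge 2$ (so that $\frac{n-1}{n}\cdot\frac{\delta^*}{24} \ge \eta$),

$$\begin{aligned}
J_2 &= P_n\Big\{\Big|\beta_{0i}\bar W_i + \bar\epsilon_i\bar W_i - \beta_{1i}\bar X_i\bar U_i - \beta_{1i}\Big(\bar U_i^2 - \frac{\sigma_{uui}}{n}\Big)\Big| > \frac{n-1}{n}\cdot\frac{\delta^*}{6}\Big\} \\
&\le P_n\{|\beta_{0i}\bar W_i| > \eta\} + P_n\{|\bar\epsilon_i\bar W_i| > \eta\} + P_n\{|\beta_{1i}\bar X_i\bar U_i| > \eta\} + P_n\Big\{\Big|\beta_{1i}\Big(\bar U_i^2 - \frac{\sigma_{uui}}{n}\Big)\Big| > \eta\Big\}
\end{aligned} \qquad (27)$$

$$\begin{aligned}
&\le P_n\{|\beta_{0i}\bar W_i| > \eta\} + P_n\{|\bar\epsilon_i| > \sqrt{\eta}\} + P_n\{|\bar W_i| > \sqrt{\eta}\} \\
&\quad + P_n\{\sqrt{|\beta_{1i}|}\,|\bar X_i| > \sqrt{\eta}\} + P_n\{\sqrt{|\beta_{1i}|}\,|\bar U_i| > \sqrt{\eta}\} + P_n\Big\{\Big|\beta_{1i}\Big(\bar U_i^2 - \frac{\sigma_{uui}}{n}\Big)\Big| > \eta\Big\}.
\end{aligned} \qquad (28)$$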
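Then by Lemma 1, we have

$$P_n\{|\beta_{0i}\bar W_i| > \eta\} = \begin{cases} o(n^{-(a-1)}), & \text{if } \beta_{0i} \ne 0, \\ 0, & \text{if } \beta_{0i} = 0, \end{cases} \qquad (29)$$

$$P_n\{|\bar\epsilon_i| > \sqrt{\eta}\} = o(n^{-(a-1)}), \qquad (30)$$

$$P_n\{|\bar W_i| > \sqrt{\eta}\} = o(n^{-(a-1)}), \qquad (31)$$

$$P_n\{\sqrt{|\beta_{1i}|}\,|\bar X_i| > \sqrt{\eta}\} = o(n^{-(a-1)}), \qquad P_n\{\sqrt{|\beta_{1i}|}\,|\bar U_i| > \sqrt{\eta}\} = o(n^{-(a-1)}), \qquad (32)$$

and, for $n \ge 2|\beta_{1i}|\sigma_{uui}/\eta$,

$$\begin{aligned}
P_n\Big\{\Big|\beta_{1i}\Big(\bar U_i^2 - \frac{\sigma_{uui}}{n}\Big)\Big| > \eta\Big\}
&\le P_n\Big\{|\beta_{1i}|\bar U_i^2 > \eta - \frac{|\beta_{1i}|\sigma_{uui}}{n}\Big\}
\le P_n\Big\{|\beta_{1i}|\bar U_i^2 > \frac{\eta}{2}\Big\} \\
&= P_n\Big\{\sqrt{|\beta_{1i}|}\,|\bar U_i| > \sqrt{\eta/2}\Big\} = o(n^{-(a-1)}).
\end{aligned} \qquad (33)$$

Finally, since $\{W_{ij}^2 - \sigma_{xxi} - \sigma_{uui}, j = 1, \ldots, n\}$ are independent with mean 0 and finite moments of order $a/2 > 1$, Lemma 1 gives

$$J_3 = P_n\Big\{\Big|\frac{1}{n}\beta_{1i}\sum_{j=1}^{n} W_{ij}^2 - \beta_{1i}(\sigma_{xxi} + \sigma_{uui})\Big| > \frac{n-1}{n}\cdot\frac{\delta^*}{6}\Big\} = o(n^{-(a/2-1)}). \qquad (34)$$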
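Identity (26), $\bar W_i\bar Y_i - \beta_{1i}\bar W_i^2 = \beta_{0i}\bar W_i + \bar\epsilon_i\bar W_i - \beta_{1i}\bar X_i\bar U_i - \beta_{1i}\bar U_i^2$, holds for every realization of model (1). The Python sketch below confirms it on one simulated sample; the parameter values and the normal draws are arbitrary conveniences, not part of the argument.

```python
import numpy as np

rng = np.random.default_rng(4)
n, beta0, beta1 = 20, 0.5, 2.0
x = rng.normal(0.0, np.sqrt(1.5), n)   # X_ij (normality is only a convenience here)
u = rng.normal(0.0, 1.0, n)            # U_ij
e = rng.normal(0.0, 0.5, n)            # epsilon_ij
w, y = x + u, beta0 + beta1 * x + e    # model (1)

lhs = w.mean() * y.mean() - beta1 * w.mean() ** 2
rhs = (beta0 * w.mean() + e.mean() * w.mean()
       - beta1 * x.mean() * u.mean() - beta1 * u.mean() ** 2)
print(abs(lhs - rhs))                  # of the order of rounding error
```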

Hence, by combining the above arguments, we have the following theorem.
Theorem 1. The selection procedure $d_n(w, y)$, defined in (8), is asymptotically optimal with convergence rate of order $n^{-(a/2-1)}$ under condition (13). That is,

$$E^{(W,Y)} L(\beta, d_n(W, Y)) = o(n^{-(a/2-1)}). \qquad (35)$$

3.2  When The Moment Generating Function Exists

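In this subsection we suppose that the moment generating functions of $\{X_{ij}^2, U_{ij}^2, \epsilon_{ij}^2\}$ exist in a neighbourhood of the origin, that is, for $-T < t < T$,

$$Ee^{tX_{ij}^2} < \infty, \qquad Ee^{tU_{ij}^2} < \infty, \qquad Ee^{t\epsilon_{ij}^2} < \infty, \qquad (36)$$

where $T$ is a positive constant.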

We first introduce the following lemma, which can be found in Petrov (1995).

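Lemma 3. Let $\{X_1, \ldots, X_n\}$ be independent random variables with mean $EX_i = 0$, $i = 1, \ldots, n$. Suppose there exist positive constants $g_1, \ldots, g_n$ and $T$ such that

$$Ee^{tX_i} \le e^{g_i t^2/2}, \qquad i = 1, \ldots, n, \qquad (37)$$

for $-T < t < T$. Let $G_n = \sum_{i=1}^{n} g_i$; then

$$P\Big\{\sum_{i=1}^{n} X_i \ge x\Big\} \le \begin{cases} e^{-x^2/(2G_n)}, & \text{if } 0 \le x \le G_nT, \\ e^{-Tx/2}, & \text{if } x > G_nT. \end{cases} \qquad (38)$$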
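To illustrate (38), take Rademacher summands ($X_i = \pm 1$ with probability 1/2 each), for which $Ee^{tX_i} = \cosh t \le e^{t^2/2}$ for every $t$, so (37) holds with $g_i = 1$, $G_n = n$ and any $T$. The Python sketch below, an example chosen for illustration rather than part of the original argument, compares Monte Carlo tail frequencies of $\sum X_i$ with the bound; the value $T = 1$ and the function name are assumptions.

```python
import numpy as np

def petrov_bound(x, G, T):
    """Right-hand side of (38): a bound on P{X_1 + ... + X_n >= x} for x >= 0."""
    if x < 0:
        raise ValueError("x must be non-negative")
    if x <= G * T:
        return float(np.exp(-x ** 2 / (2 * G)))
    return float(np.exp(-T * x / 2))

# Rademacher summands satisfy (37) with g_i = 1: cosh(t) <= exp(t^2 / 2) for all t.
t = np.linspace(-5.0, 5.0, 1001)
assert np.all(np.cosh(t) <= np.exp(t ** 2 / 2))

rng = np.random.default_rng(2)
n, reps = 100, 20_000
sums = rng.choice([-1, 1], size=(reps, n)).sum(axis=1)
results = {}
for x in (10, 20, 30):
    empirical = float(np.mean(sums >= x))
    bound = petrov_bound(x, G=n, T=1.0)  # G_n = n; T = 1 is an arbitrary admissible choice
    results[x] = (empirical, bound)
    print(x, empirical, round(bound, 4))
```

The empirical frequencies sit well below the bound, which is typical: (38) trades sharpness for requiring only the moment generating function condition (37).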


The following lemma clarifies the probabilistic meaning of the conditions of Lemma 3.

Lemma 4. Let $X$ be a random variable with mean $EX = 0$. The following two assertions are equivalent:

(I) There exist positive constants $g$ and $H$ such that

$$Ee^{tX} \le e^{gt^2/2} \quad \text{for } -H < t < H; \qquad (39)$$

(II) There exists a positive constant $T$ such that

$$Ee^{tX} < \infty \quad \text{for } -T < t < T. \qquad (40)$$


Proof. It is clear that (I) implies (II). We now prove that (II) also implies (I). If (II) is fulfilled, then the random variable $X$ has moments of all orders, and the following relation holds:

$$\log Ee^{tX} = \frac{\sigma^2 t^2}{2} + o(t^2) \qquad (41)$$

as $t \to 0$, where $\sigma^2 = EX^2$. For any constant $g > \sigma^2$, the inequalities $\log Ee^{tX} \le gt^2/2$ and $Ee^{tX} \le e^{gt^2/2}$ hold for all sufficiently small $|t|$, that is, (I) is true. This completes the proof of Lemma 4. As the proof shows, when $\sigma^2 > 0$ we can always set $g = 2\sigma^2$.
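The restriction to sufficiently small $|t|$ in this proof is visible in a simple case. For $X = E - 1$ with $E$ standard exponential, $\sigma^2 = 1$ and $\log Ee^{tX} = -t - \log(1 - t)$ for $t < 1$. The Python sketch below (an illustrative example, not from the original) checks that the choice $g = 2\sigma^2$ satisfies (39) on $|t| \le 0.5$ but fails at $t = 0.7$.

```python
import numpy as np

def log_mgf_centered_exp(t):
    """log E exp{t(E - 1)} for E ~ Exp(1); finite only for t < 1."""
    t = np.asarray(t, dtype=float)
    return -t - np.log1p(-t)

sigma2 = 1.0        # Var(E - 1) = 1
g = 2 * sigma2      # the choice g = 2 sigma^2 from the proof of Lemma 4
t_small = np.linspace(-0.5, 0.5, 101)
ok_small = bool(np.all(log_mgf_centered_exp(t_small) <= g * t_small ** 2 / 2))
ok_at_07 = bool(log_mgf_centered_exp(0.7) <= g * 0.7 ** 2 / 2)
print(ok_small, ok_at_07)  # True False
```

So $H$ in (39) depends on the distribution, which is why Lemma 4 only guarantees the bound on some neighbourhood of the origin.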

We further assume that the 4-th moments of $\{X_{ij}, U_{ij}, \epsilon_{ij}\}$ are bounded, that is, there exists a positive constant $C$ such that

$$EX_{ij}^4 < C, \qquad EU_{ij}^4 < C, \qquad E\epsilon_{ij}^4 < C. \qquad (42)$$

We can see from (42) that $EW_{ij}^4$, $EY_{ij}^4$ and $E(W_{ij}Y_{ij})^2$ are all bounded.

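We analyze $P_n\{S_{wwi} - \sigma_{uui} < \sigma_{xxi}/2\}$ first. Let $\epsilon = \sigma_{xxi}/2$. Since $EW_{ij} = 0$ and $\sigma_{wwi} = \sigma_{xxi} + \sigma_{uui}$, the argument used in the proof of Lemma 2 (with $\mu = 0$) gives

$$P_n\Big\{S_{wwi} - \sigma_{uui} < \frac{\sigma_{xxi}}{2}\Big\} \le P_n\{|S_{wwi} - \sigma_{wwi}| > \epsilon\} \qquad (43)$$

$$\le P_n\Big\{\Big|\frac{1}{n}\sum_{j=1}^{n}(W_{ij}^2 - \sigma_{wwi})\Big| > \frac{\epsilon}{4}\Big\} + P_n\Big\{|\bar W_i| > \frac{\sqrt{\epsilon}}{2}\Big\} := K_1 + K_2, \qquad (44)$$

for $n$ large enough, that is, when $n \ge \max(4, 1 + 4\sigma_{wwi}/\epsilon)$. For $j = 1, \ldots, n$, $E(W_{ij}^2 - \sigma_{wwi}) = 0$, and since $W_{ij}^2 \le 2X_{ij}^2 + 2U_{ij}^2$ with $X_{ij}$ and $U_{ij}$ independent, for $-T/2 < t < T/2$,

$$Ee^{t(W_{ij}^2 - \sigma_{wwi})} \le e^{|t|\sigma_{wwi}}\,Ee^{2|t|X_{ij}^2}\,Ee^{2|t|U_{ij}^2} < \infty. \qquad (45)$$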

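By Lemma 3 and Lemma 4 (applied to $W_{ij}^2 - \sigma_{wwi}$ and to $-(W_{ij}^2 - \sigma_{wwi})$), we have

$$K_1 = P\Big\{\Big|\sum_{j=1}^{n}(W_{ij}^2 - \sigma_{wwi})\Big| > \frac{n\epsilon}{4}\Big\} \le \begin{cases} 2e^{-n^2\epsilon^2/(32G_n)}, & \text{if } \epsilon \le 2TG_n/n, \\ 2e^{-Tn\epsilon/16}, & \text{if } \epsilon > 2TG_n/n, \end{cases} \qquad (46)$$

where $G_n$ is twice the sum of the $n$ variances of $W_{ij}^2 - \sigma_{wwi}$, $j = 1, \ldots, n$. Since $\{EW_{ij}^4, j = 1, \ldots, n\}$ are bounded, $G_n = O(n)$. Therefore,

$$K_1 = O(e^{-c_{11}n}), \qquad (47)$$

where $c_{11}$ is a positive constant. Similarly, for $-T < t < T$,

$$Ee^{tW_{ij}} < \infty, \qquad (48)$$

and hence

$$K_2 = P\Big\{\Big|\frac{1}{n}\sum_{j=1}^{n} W_{ij}\Big| > \frac{\sqrt{\epsilon}}{2}\Big\} = O(e^{-c_{12}n}), \qquad (49)$$

where $c_{12}$ is also a positive constant.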
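The step from $G_n = O(n)$ to (47) can be made explicit. Suppose $G_n \le C_0 n$ for all $n$, where $C_0 > 0$ is a constant name introduced here for illustration. In the first case of (46) the exponent is then at least linear in $n$, as the short derivation below shows, while the second case of (46) is already of the form $e^{-\kappa n}$ with $\kappa > 0$ proportional to $T\epsilon$. Hence any $c_{11} \le \min\{\epsilon^2/(32C_0), \kappa\}$ gives $K_1 = O(e^{-c_{11}n})$.

```latex
% Assumed bound: G_n <= C_0 n for all n (C_0 > 0 is a name introduced here).
\[
  \frac{n^{2}\epsilon^{2}}{32\,G_n}
  \;\ge\; \frac{n^{2}\epsilon^{2}}{32\,C_0\,n}
  \;=\; \frac{\epsilon^{2}}{32\,C_0}\,n ,
  \qquad\text{so}\qquad
  e^{-n^{2}\epsilon^{2}/(32G_n)} \le e^{-\epsilon^{2} n/(32C_0)} .
\]
```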

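Next we consider $P_n\{|\hat\beta_{1i} - \beta_{1i}| \ge \delta, S_{wwi} - \sigma_{uui} \ge \sigma_{xxi}/2\}$. With $\delta^* = \sigma_{xxi}\delta/2$, exactly as in (22) we have

$$\begin{aligned}
&P_n\Big\{|\hat\beta_{1i} - \beta_{1i}| \ge \delta, \ S_{wwi} - \sigma_{uui} \ge \frac{\sigma_{xxi}}{2}\Big\} \\
&\quad \le P_n\Big\{\Big|\frac{1}{n-1}\sum_{j=1}^{n}(W_{ij}Y_{ij} - \beta_{1i}\sigma_{xxi})\Big| > \frac{\delta^*}{6}\Big\} \\
&\qquad + P_n\Big\{\Big|\frac{n}{n-1}\bar W_i\bar Y_i - \frac{n}{n-1}\beta_{1i}\bar W_i^2 + \frac{1}{n-1}\beta_{1i}\sigma_{uui}\Big| > \frac{\delta^*}{6}\Big\} \\
&\qquad + P_n\Big\{\Big|\frac{\beta_{1i}}{n-1}\sum_{j=1}^{n}(W_{ij}^2 - \sigma_{xxi} - \sigma_{uui})\Big| > \frac{\delta^*}{6}\Big\} \\
&\quad := L_1 + L_2 + L_3.
\end{aligned} \qquad (50)$$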

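For any $i = 1, \ldots, k$, $\{W_{ij}Y_{ij}, j = 1, \ldots, n\}$ are independently but not necessarily identically distributed random variables. Since $|W_{ij}Y_{ij}| \le (W_{ij}^2 + Y_{ij}^2)/2$, the Cauchy-Schwarz inequality gives, for all $t$ in a sufficiently small neighbourhood of the origin,

$$Ee^{tW_{ij}Y_{ij}} \le \big(Ee^{|t|W_{ij}^2}\,Ee^{|t|Y_{ij}^2}\big)^{1/2} < \infty. \qquad (51)$$

Besides, for each $i$ and $j$, the variance of $W_{ij}Y_{ij}$ is bounded; therefore, by Lemmas 3 and 4,

$$L_1 = P_n\Big\{\Big|\frac{1}{n}\sum_{j=1}^{n}(W_{ij}Y_{ij} - \beta_{1i}\sigma_{xxi})\Big| > \frac{n-1}{n}\cdot\frac{\delta^*}{6}\Big\} \qquad (52)$$

$$= O(e^{-c_{21}n}), \qquad (53)$$

where $c_{21}$ is a positive constant. Next we analyze $L_2$ and $L_3$. Similarly,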


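$$\begin{aligned}
L_2 &\le P_n\{|\beta_{0i}\bar W_i| > \eta\} + P_n\{|\bar\epsilon_i| > \sqrt{\eta}\} + P_n\{|\bar W_i| > \sqrt{\eta}\} \\
&\quad + P_n\{\sqrt{|\beta_{1i}|}\,|\bar X_i| > \sqrt{\eta}\} + P_n\{\sqrt{|\beta_{1i}|}\,|\bar U_i| > \sqrt{\eta}\} + P_n\Big\{\Big|\beta_{1i}\Big(\bar U_i^2 - \frac{\sigma_{uui}}{n}\Big)\Big| > \eta\Big\}
\end{aligned} \qquad (54)$$

$$= O(e^{-c_{22}n}), \qquad (55)$$

where $\eta = \delta^*/48$, and

$$L_3 = P_n\Big\{\Big|\frac{1}{n}\beta_{1i}\sum_{j=1}^{n} W_{ij}^2 - \beta_{1i}(\sigma_{xxi} + \sigma_{uui})\Big| > \frac{n-1}{n}\cdot\frac{\delta^*}{6}\Big\} \qquad (56)$$

$$= O(e^{-c_{23}n}), \qquad (57)$$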
where $c_{22}$ and $c_{23}$ are positive constants. Hence, by the above argument, if we set $c^* = \min(c_{11}, c_{12}, c_{21}, c_{22}, c_{23})$, with the minimum also taken over $i = 1, \ldots, k$, then $c^* > 0$. We have the following theorem.

Theorem 2. The selection procedure $d_n(w, y)$, defined in (8), is asymptotically optimal with convergence rate of order $e^{-c^*n}$ under conditions (36) and (42). That is,

$$E^{(W,Y)} L(\beta, d_n(W, Y)) = O(e^{-c^*n}). \qquad (58)$$

We consider two special situations next.

Two special situations.

1. $\{(X_{ij}, U_{ij}, \epsilon_{ij}), 1 \le j \le n\}$ are normally distributed. In this case, the vectors $(X_{ij}, U_{ij}, \epsilon_{ij})$ are i.i.d. $N_3((0, 0, 0), \mathrm{diag}(\sigma_{xxi}, \sigma_{uui}, \sigma_{\epsilon\epsilon i}))$. Since $X_{ij}^2/\sigma_{xxi}$, $U_{ij}^2/\sigma_{uui}$ and $\epsilon_{ij}^2/\sigma_{\epsilon\epsilon i}$ are $\chi_1^2$ distributed, their moment generating functions exist in a neighbourhood of 0, and the 4-th moments of $\{(X_{ij}, U_{ij}, \epsilon_{ij})\}$ are also bounded. By Theorem 2, in the normal case the selection procedure $d_n(w, y)$, defined in (8), is asymptotically optimal with rate of convergence of order $O(e^{-c^*n})$.

2. $\{(X_{ij}, U_{ij}, \epsilon_{ij}), 1 \le j \le n\}$ are bounded. Then conditions (36) and (42) always hold and, therefore, the selection procedure $d_n(w, y)$ is asymptotically optimal with convergence rate of order $O(e^{-c^*n})$.


4  Simulations 


We carried out a simulation study to investigate the performance of the selection procedure $d_n(w, y)$. The expected risk $E^{(W,Y)} L(\beta, d_n(W, Y))$ is used as the measure of performance of the selection rule. For any observations $(W, Y)$, let

$$D(W, Y) = \begin{cases} 1, & \text{if we make a wrong selection,} \\ 0, & \text{if we make a correct selection.} \end{cases} \qquad (59)$$

Then, by the law of large numbers, the sample mean of $D(W, Y)$ over independent replications can be used as an estimator of the expected risk $E^{(W,Y)} L(\beta, d_n(W, Y))$.

The simulation scheme is described as follows:

1. For each $j = 1, \ldots, n$ and each $i = 1, 2$ and 3, we generated independent random observations $(X_{ij}, U_{ij}, \epsilon_{ij})$ from the multivariate normal distribution $N_3((0, 0, 0)^T, \mathrm{diag}(\sigma_{xxi}, \sigma_{uui}, \sigma_{\epsilon\epsilon i}))$.

2. Let $W_{ij} = X_{ij} + U_{ij}$ and $Y_{ij} = \beta_{0i} + \beta_{1i}X_{ij} + \epsilon_{ij}$.

3. Based on $(W_{ij}, Y_{ij})$, we obtained the estimators of $\beta_{1i}$, then made the selection and computed $D(W, Y)$.

4. Steps 1, 2 and 3 were repeated 4000 times. The average of $D(W, Y)$ over the 4000 repetitions, denoted by $\bar D_n$, is used as an estimator of the expected risk $E^{(W,Y)} L(\beta, d_n(W, Y))$.

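The results are listed for the case where

$$\begin{aligned}
&\sigma_{xx1} = \sigma_{xx2} = \sigma_{xx3} = 1.5, \\
&\sigma_{uu1} = \sigma_{uu2} = \sigma_{uu3} = 1, \\
&\sigma_{\epsilon\epsilon 1} = \sigma_{\epsilon\epsilon 2} = \sigma_{\epsilon\epsilon 3} = 0.25, \\
&\beta_{01} = \beta_{02} = \beta_{03} = 0, \\
&\beta_{11} = 0, \ \beta_{12} = 1, \ \beta_{13} = 2,
\end{aligned}$$

and

$$n = 5, 10, 15, 20, 30, 40, 50, 100.$$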
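Steps 1-4 of the scheme can be sketched in Python under the settings just listed, taking $\sigma_{uui} = 1$ from those settings. This is a re-implementation for illustration only, not the authors' code: the vectorized layout, the seed and the function names are assumptions, the slope estimator follows the two cases of (7), and the numbers it prints come from this sketch rather than from the reported results.

```python
import numpy as np

def slopes(w, y, s_uu):
    """Vectorised estimator (5)-(7): one slope estimate per row (replication)."""
    n = w.shape[1]
    wc = w - w.mean(axis=1, keepdims=True)
    yc = y - y.mean(axis=1, keepdims=True)
    s_ww = (wc * wc).sum(axis=1) / (n - 1)
    s_wy = (wc * yc).sum(axis=1) / (n - 1)
    s_yy = (yc * yc).sum(axis=1) / (n - 1)
    with np.errstate(divide="ignore", invalid="ignore"):
        # np.where evaluates both branches; the unused one is discarded.
        return np.where(s_ww - s_uu > 0, s_wy / (s_ww - s_uu), s_yy / s_wy)

def estimated_risk(n, reps, beta0, beta1, s_xx, s_uu, s_ee, seed=0):
    """Steps 1-4: the average of D(W, Y) over `reps` replications."""
    rng = np.random.default_rng(seed)
    k = len(beta1)
    est = np.empty((reps, k))
    for i in range(k):
        x = rng.normal(0.0, np.sqrt(s_xx[i]), (reps, n))   # step 1
        u = rng.normal(0.0, np.sqrt(s_uu[i]), (reps, n))
        e = rng.normal(0.0, np.sqrt(s_ee[i]), (reps, n))
        w, y = x + u, beta0[i] + beta1[i] * x + e           # step 2
        est[:, i] = slopes(w, y, s_uu[i])                    # step 3
    wrong = est.argmax(axis=1) != int(np.argmax(beta1))      # D(W, Y) = 1 on a wrong selection
    return float(wrong.mean())                               # step 4

settings = dict(beta0=[0.0, 0.0, 0.0], beta1=[0.0, 1.0, 2.0],
                s_xx=[1.5] * 3, s_uu=[1.0] * 3, s_ee=[0.25] * 3)
risks = {n: estimated_risk(n, 4000, **settings) for n in (5, 10, 15, 20, 30, 40, 50, 100)}
for n, r in risks.items():
    print(n, r)
```

Generating all replications for a treatment as one $(\text{reps} \times n)$ array keeps the sketch fast; a loop over replications would give the same estimator.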


From the results of the simulation (see the last page), we can observe that the values of $\bar D_n$ decrease quite rapidly as $n$ increases, for $n \le 100$. This is consistent with Theorem 2, according to which the rate of convergence is of order $O(e^{-c^*n})$.